note that the .Rmd can be found on my personal repo for the corse

Constraining the effects of topography and climate on climate change sensitivity of glaciers in Tibet

The Tibetan Plateau region is home to the greatest amount of non-polar glaciers, which are also highly sensitive to climate change and has recently been experiencing the greatest rates of mass losses. These glaciers are primary freshwater sources to major rivers across Asia, with increased rates of melting increasing the risks of floods and other geohazards to some of the world’s most densely populated regions. Yet the geophysical and atmospheric conditions driving changes in Tibetan plateau glacier mass balance remain relatively understudied, with uncertainties in which variables can best predict glacier mass change. In this analysis I will use three different modelling approaches to assess which topographical and climatological drives are most important in explaining changes in glacier equilibrium line altitude (ELA) and whether these drives can be used to explain glacier ELA changes since the little ice age. The first approach uses a Mixed effects model with random effects, where all variables where chosen based on a priori knowledge gained from the readings and an exploratory correlation plot. Following this I constructed an indentically stuctured hirerachical Baysian model (using the MCMCglm package), to compair the identified importance of variables under a frequentist vs baysian framework. As the exploratory correlation plot showed that many variables displayed high levels of collinearity/multicollinearity, I decided to create a model using principal components from a PCA, to preserve all available variables, while not violating any assumptions of variable collinearity and codependence. Finally, I decided to use an automatic variable selection method (MUMIn), where I included the maximum number of variables possible, within the computational limits of my PC. While I believe that xxx method best models which variables influence ELA and dELA, because xxx, I synthesize the results all three models to judge xxx.

In this analysis I attempted to justify all modelling structure and variable selection decisions made. As I quite unfamiliar with climatic modelling and glacier dynamics, some of these decisions may be sub-optimal. Yet, these decisions were made to the best of my abilities, without going over the top on background research and overall, my models should have a coherent structure and sufficient workflow documentation. In none of the preliminary readings, did I encountered the temperature or precipitation in a specific month singled out to be of particular importance for glacier ELA. Therefore, I decided to exclude all monthly climatic variables and only focus on the aggregated factors. Along with the provided variables, I also included a new variable called temperature variability, as my preliminary readings highlighted its potential importance (Brun, et. al., 2017) and I was able to compute it with the provided dataset.

Multiple sources confirm that overall there is a lack of knowledge on climatic data and their influence on the TP, due to a lack of Permanente meteorological weather stations in the TP, with the influence of Monsoon being aknoweleged (Maussion et. al., 2014).

1.Data import and wrangeling

glacier <- read_csv("~/Documents/humbolt/quantitative_methods/assignments/glacier_sensitivity.csv") %>%
  na.exclude 
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   orientation = col_character()
## )
## ℹ Use `spec()` for the full column specifications.
glacier_scaled <- glacier %>% 
  # adding a new variable for temperature variability; a factor mentioned to be important in multiple papers
  mutate(T_variability = ((T_MAX_mean_monsoon + T_MAX_mean_not_monsoon)-(T_MIN_mean_monsoon + T_MIN_mean_not_monsoon) /2),
          # scaling all varibales used in modelling
         length = scale(length, center = T, scale = T),
         area = scale(area, center = T, scale = T),
         P_snow = scale(P_snow, center = T, scale = T),
         P_year = scale(P_year, center = T, scale = T),
         P_monsoon = scale(P_monsoon, center = T, scale = T),
         P_not_monsoon = scale(P_not_monsoon, center = T, scale = T),
         T_MIN_mean_monsoon = scale(T_MIN_mean_monsoon, center = T, scale = T),
         T_MIN_mean_not_monsoon = scale(T_MIN_mean_not_monsoon, center = T, scale = T),
         T_MAX_mean_monsoon = scale(T_MAX_mean_monsoon, center = T, scale = T),
         T_MAX_mean_not_monsoon = scale(T_MAX_mean_not_monsoon, center = T, scale = T),
         T_variability = scale(T_variability, center = T, scale = T),
         T_mean_mea.yr = scale(T_mean_mea.yr, center = T, scale = T),
         T_mean_monsoon = scale(T_mean_monsoon, center = T, scale = T),
         T_mean_not_monsoon = scale(T_mean_not_monsoon, center = T, scale = T),
         Slope_min = scale(Slope_min, center = T, scale = T),
         Slope_max = scale(Slope_max, center = T, scale = T),
         Slope_mean = scale(Slope_mean, center = T, scale = T),
         Elev_min = scale(Elev_min, center = T, scale = T),
         Elev_max = scale(Elev_max, center = T, scale = T),
         Elev_mean = scale(Elev_mean, center = T, scale = T))




#ggplot(data=glacier, aes_string(x = id, y = "")) + geom_line()

colnames(glacier) 
##  [1] "id"                     "ELA"                    "dELA"                  
##  [4] "summit"                 "debris_cov"             "morph_type"            
##  [7] "orientation"            "length"                 "area"                  
## [10] "P_01_mean"              "P_02_mean"              "P_03_mean"             
## [13] "P_04_mean"              "P_05mean"               "P_06_mean"             
## [16] "P_07_mean"              "P_08_mean"              "P_09_mean"             
## [19] "P_10_mean"              "P_11_mean"              "P_12_mean"             
## [22] "P_snow"                 "P_year"                 "P_monsoon"             
## [25] "P_not_monsoon"          "T_MIN_mean.1"           "T_MIN_mean.2"          
## [28] "T_MIN_mean.3"           "T_MIN_mean.4"           "T_MIN_mean.5"          
## [31] "T_MIN_mean.6"           "T_MIN_mean.7"           "T_MIN_mean.8"          
## [34] "T_MIN_mean.9"           "T_MIN_mean.10"          "T_MIN_mean.11"         
## [37] "T_MIN_mean.12"          "T_MIN_mean_monsoon"     "T_MIN_mean_not_monsoon"
## [40] "T_MAX_mean.1"           "T_MAX_mean.2"           "T_MAX_mean.3"          
## [43] "T_MAX_mean.4"           "T_MAX_mean.5"           "T_MAX_mean.6"          
## [46] "T_MAX_mean.7"           "T_MAX_mean.8"           "T_MAX_mean.9"          
## [49] "T_MAX_mean.10"          "T_MAX_mean.11"          "T_MAX_mean.12"         
## [52] "T_MAX_mean_monsoon"     "T_MAX_mean_not_monsoon" "T_mean_mea.1"          
## [55] "T_mean_mea.2"           "T_mean_mea.3"           "T_mean_mea.4"          
## [58] "T_mean_mea.5"           "T_mean_mea.6"           "T_mean_mea.7"          
## [61] "T_mean_mea.8"           "T_mean_mea.9"           "T_mean_mea.10"         
## [64] "T_mean_mea.11"          "T_mean_mea.12"          "T_mean_mea.yr"         
## [67] "T_mean_monsoon"         "T_mean_not_monsoon"     "Slope_min"             
## [70] "Slope_max"              "Slope_mean"             "Elev_min"              
## [73] "Elev_max"               "Elev_mean"
#(corr_plot <- ggpairs(data = glacier, columns = c(2:9,22:25,38:39,52:53,66:74)))
#corplot?
#corrplot.mixed(cor1[,2:9,22:25,38:39,52:53,66:74], lower.col = "black" , number.cex = .7)

# Here a number of strong correlations between variables can be seen. Notabale is this Summit and Max elevation seem to be the same variable and that xxxx

Q1. Which are the most important topographical and climatological drivers of glacier equilibrium line altitude (ELA)?

Mixed effects model with random effects

In this first section, I chose to create a Mixed effects model with random effects (lme4 package), where all variables where chosen based on a priori knowelge gained from the readings and the exploratory correlation plot. The methodoological advantage of this approach is that through including the random effects, correlations between data coming from specific morphology types and geographical orientations can be accounted for, by treating them as a grouping factor. While this prevents us from explicitly assessing the impacts of morphology and geographic orientation, it still allows us to see how they influence the observed patterns in other varibales. In the final section of this analysis (the automated variable selection) morphology and geographic orientation are left as fixed effects, so that I have them once as a fixed and once as a random effect, to allow for compairson between the two secarios. As the Distribution of ELA had a slight negative skew, I also tried running a GLM with a inverse gaussian distribution (link = "1/mu^2"), but encoured the error that the "PIRLS step-halvings failed to reduce deviance in pwrssUpdate", to which I did not find a solution.

hist(glacier_scaled$ELA) # there is a negative skew to the data, but overall a gaussian distibution looks appropriate

m_lme4 <- lmer(ELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  T_mean_mea.yr +  T_variability +                      Slope_mean + Elev_max + Elev_mean + (1|morph_type) + (1|orientation), 
               data = glacier_scaled) 
summary(m_lme4)
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## ELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  
##     T_mean_mea.yr + T_variability + Slope_mean + Elev_max + Elev_mean +  
##     (1 | morph_type) + (1 | orientation)
##    Data: glacier_scaled
## 
## REML criterion at convergence: 9555.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -6.1407 -0.4238  0.0367  0.4765  6.2845 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  orientation (Intercept)  129.2   11.37   
##  morph_type  (Intercept)  468.9   21.65   
##  Residual                4819.7   69.42   
## Number of obs: 849, groups:  orientation, 8; morph_type, 6
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept)   5098.664     12.038 423.553
## debris_cov       2.774     11.248   0.247
## length         -23.132      6.984  -3.312
## area           -21.740      6.526  -3.332
## P_snow          20.095      4.940   4.068
## P_monsoon        8.067      4.476   1.802
## P_not_monsoon  -33.825      7.117  -4.753
## T_mean_mea.yr -101.070     10.953  -9.227
## T_variability   59.366      7.939   7.477
## Slope_mean      -6.581      3.339  -1.971
## Elev_max         8.239      5.691   1.448
## Elev_mean      141.943      7.940  17.876
## 
## Correlation of Fixed Effects:
##             (Intr) dbrs_c length area   P_snow P_mnsn P_nt_m T_mn_. T_vrbl
## debris_cov  -0.096                                                        
## length      -0.059 -0.044                                                 
## area        -0.031 -0.060 -0.722                                          
## P_snow       0.022 -0.004 -0.039  0.069                                   
## P_monsoon   -0.006  0.039 -0.163  0.077  0.016                            
## P_not_monsn -0.026  0.023  0.126 -0.127 -0.713 -0.546                     
## T_mean_m.yr  0.024 -0.081  0.021 -0.085  0.026 -0.047 -0.138              
## T_variablty  0.008 -0.013 -0.058  0.124  0.256  0.115 -0.116 -0.689       
## Slope_mean   0.059  0.005  0.095  0.177  0.066 -0.061  0.025 -0.384  0.177
## Elev_max    -0.012 -0.131 -0.319 -0.191 -0.028  0.010  0.090 -0.092  0.165
## Elev_mean    0.031 -0.016  0.160  0.041 -0.109 -0.094  0.132  0.612 -0.105
##             Slp_mn Elv_mx
## debris_cov               
## length                   
## area                     
## P_snow                   
## P_monsoon                
## P_not_monsn              
## T_mean_m.yr              
## T_variablty              
## Slope_mean               
## Elev_max    -0.477       
## Elev_mean   -0.067 -0.371
# R squared
r.squaredGLMM(m_lme4)
## Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
##            R2m       R2c
## [1,] 0.8996553 0.9107324
# Here we mainly need to focus on the The R²(C), which is 90.78 %. The C represents the Conditional R_GLMM² and can be interpreted as a variance explained by the entire model, including both fixed and random effects


# checking out variable co-lineararity 
plot(m_lme4)

# the residual vs fitted plot shows a mostly flat and even distribution of points, with greater diviances seen at the lower spectrum of points, likely due to the negative skew described above


qqnorm(resid(m_lme4))
qqline(resid(m_lme4))

# When checking the QQ-plot it is seen that there is a left tail at lower bonds and a right tail at the upper bonds. Following an ad hoc transformation of the dependant variable ELA, this observed trend only showed a very marginal improvment, so I decided to keep ELA in its untransformed state.  


# Variable effect size visualization
dwplot(m_lme4) +
   geom_vline(xintercept = 0, colour = "grey60", linetype = 2)

ggcoefstats(x = m_lme4, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", exclude.intercept = T, stats.labels =F) 

# In the dot-whiskar visual representation of the effect sizes and standard errors of our the variables we can clearly see that mean elevation and mean annual temperature are the two most influencial variables at predicting ELA. These are then followed by percipitation in non-monsoon phases, aswell as the temperature vairability varable that I aggrigated from mean temperature data. 
hist(glacier_scaled$dELA)

# there is a stronger skew to the data. While again an inverse Gaussian distribution seems approaprite, this was also not possible. Instead I compaired a the model with a normal Gaussian and a Possion distribution and found that the Possion had a better distribution of residuals.

m_lme4_d <- glmer(dELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  T_mean_mea.yr +  T_variability +                      Slope_mean + Elev_max + Elev_mean + (1|morph_type) + (1|orientation), 
                family = poisson  ,data = glacier_scaled) 
summary(m_lme4_d)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: 
## dELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  
##     T_mean_mea.yr + T_variability + Slope_mean + Elev_max + Elev_mean +  
##     (1 | morph_type) + (1 | orientation)
##    Data: glacier_scaled
## 
##      AIC      BIC   logLik deviance df.resid 
##  26667.4  26733.8 -13319.7  26639.4      835 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -9.4921 -3.6701 -0.7578  2.7556 30.8456 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  orientation (Intercept) 0.01299  0.1140  
##  morph_type  (Intercept) 0.67998  0.8246  
## Number of obs: 849, groups:  orientation, 8; morph_type, 6
## 
## Fixed effects:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    4.297871   0.342767  12.539  < 2e-16 ***
## debris_cov    -0.287961   0.016273 -17.696  < 2e-16 ***
## length         0.022230   0.009986   2.226 0.026009 *  
## area          -0.004885   0.009229  -0.529 0.596575    
## P_snow         0.034998   0.007133   4.907 9.27e-07 ***
## P_monsoon      0.047429   0.006502   7.294 3.00e-13 ***
## P_not_monsoon -0.104107   0.010147 -10.260  < 2e-16 ***
## T_mean_mea.yr  0.230208   0.014964  15.384  < 2e-16 ***
## T_variability -0.022877   0.010742  -2.130 0.033192 *  
## Slope_mean     0.091553   0.004734  19.339  < 2e-16 ***
## Elev_max       0.059419   0.007670   7.747 9.41e-15 ***
## Elev_mean      0.038329   0.010282   3.728 0.000193 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) dbrs_c length area   P_snow P_mnsn P_nt_m T_mn_. T_vrbl
## debris_cov  -0.002                                                        
## length      -0.007 -0.040                                                 
## area        -0.002 -0.071 -0.706                                          
## P_snow       0.004 -0.015 -0.039  0.075                                   
## P_monsoon    0.002  0.052 -0.184  0.067  0.010                            
## P_not_monsn -0.005  0.028  0.151 -0.135 -0.718 -0.552                     
## T_mean_m.yr  0.001 -0.093 -0.020 -0.080  0.011 -0.036 -0.105              
## T_variablty  0.000 -0.017 -0.047  0.125  0.254  0.168 -0.175 -0.736       
## Slope_mean   0.003  0.005  0.119  0.167  0.078 -0.101  0.023 -0.419  0.215
## Elev_max    -0.001 -0.105 -0.322 -0.194 -0.034  0.047  0.063 -0.049  0.129
## Elev_mean    0.000 -0.052  0.134  0.040 -0.149 -0.039  0.145  0.605 -0.181
##             Slp_mn Elv_mx
## debris_cov               
## length                   
## area                     
## P_snow                   
## P_monsoon                
## P_not_monsn              
## T_mean_m.yr              
## T_variablty              
## Slope_mean               
## Elev_max    -0.482       
## Elev_mean   -0.097 -0.359
# R squared (conditional)
r.squaredGLMM(m_lme4_d)
## Warning: The null model is correct only if all variables used by the original
## model remain unchanged.
##                  R2m       R2c
## delta     0.04899643 0.9864697
## lognormal 0.04899972 0.9865360
## trigamma  0.04899310 0.9864028
# The R²(C) is 98.65 %, where the C represents the Conditional R_GLMM² and can be interpreted as a variance explained by the entire model, including both fixed and random effects


# checking out variable co-lineararity 
plot(m_lme4_d)

# the residual vs fitted plot shows a mostly flat and even distribution of points, with some outlying point at intermidiate values

qqnorm(resid(m_lme4_d))

# When checking the QQ-plot it is seen that there is slight tail at the upper bound, but overall this is a significant improvement over the gaussian distribution.


# Variable effect size visualization
dwplot(m_lme4_d) +
   geom_vline(xintercept = 0, colour = "grey60", linetype = 2)

ggcoefstats(x = m_lme4_d, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", exclude.intercept = T, stats.labels =F) 

# In the dot-whiskar visual representation of the effect sizes and standard errors of our the variables we can clearly see that as before (normal ELA) mean annual temperature is an highly influencial variable at predicting ELA. This is followed by followed by the previouly identified varaible of percipitation in non-monsoon phases, aswell as the new debris cover variable. 

Bayesian Models

The default prior values assumed in a MCMCglmm are posterior distribution with very large variance values for the fixed effects and a flat (weakly informative) prior. The variances for the random effects are assumed to have inverse-Wishart priors with a very low value for nu (weakly informative). I decided to not dealve deeper into assigning my own priors, due to me having i) little prior knowlege the dyamics / expected values and distributions of variables and ii) relatively shallow knowelge of the mathimatical underpinnings and practical inplimentation methods for assigning more informative priors

m_bays <- MCMCglmm(ELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  T_mean_mea.yr +  T_variability +                      Slope_mean + Elev_max + Elev_mean, random = ~ morph_type + orientation, 
               data = glacier_scaled) 
## Warning: Unknown or uninitialised column: `family`.
## Warning: Setting row names on a tibble is deprecated.
## 
##                        MCMC iteration = 0
## 
##                        MCMC iteration = 1000
## 
##                        MCMC iteration = 2000
## 
##                        MCMC iteration = 3000
## 
##                        MCMC iteration = 4000
## 
##                        MCMC iteration = 5000
## 
##                        MCMC iteration = 6000
## 
##                        MCMC iteration = 7000
## 
##                        MCMC iteration = 8000
## 
##                        MCMC iteration = 9000
## 
##                        MCMC iteration = 10000
## 
##                        MCMC iteration = 11000
## 
##                        MCMC iteration = 12000
## 
##                        MCMC iteration = 13000
summary(m_bays)
## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: 9643.38 
## 
##  G-structure:  ~morph_type
## 
##            post.mean  l-95% CI u-95% CI eff.samp
## morph_type     593.8 5.687e-13     2200    228.5
## 
##                ~orientation
## 
##             post.mean  l-95% CI u-95% CI eff.samp
## orientation     99.24 8.156e-17      357     31.2
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units      4895     4393     5323     1000
## 
##  Location effects: ELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon + T_mean_mea.yr + T_variability + Slope_mean + Elev_max + Elev_mean 
## 
##               post.mean  l-95% CI  u-95% CI eff.samp  pMCMC    
## (Intercept)   5100.3810 5079.0857 5128.6128   885.34 <0.001 ***
## debris_cov       4.8727  -19.3336   25.2913  1000.00  0.674    
## length         -24.5246  -38.2850   -9.1072   828.97 <0.001 ***
## area           -22.2857  -35.9854   -9.7268  1000.00 <0.001 ***
## P_snow          20.0804   10.7564   30.6291  1000.00 <0.001 ***
## P_monsoon        9.4847   -0.1263   19.2150    89.91  0.062 .  
## P_not_monsoon  -34.5102  -48.4661  -19.9021  1000.00 <0.001 ***
## T_mean_mea.yr  -96.0174 -119.1528  -74.3069    42.92 <0.001 ***
## T_variability   56.9360   42.2042   73.7430   329.41 <0.001 ***
## Slope_mean      -7.0984  -13.5776   -0.2236   488.94  0.034 *  
## Elev_max         9.0663   -2.4500   19.9947   828.84  0.126    
## Elev_mean      144.3644  126.2522  159.6352   120.16 <0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# R squared
#r.squaredGLMM(m_bays)
# Unfortunatly I was not able to figure out access using the r2 with a predefined function, or an official way to cumpute it. Therefore For me bayesian models I do not include r2 values.


# postirior distribution of random effects
par(mfrow = c(1,2))
hist(mcmc(m_bays$VCV)[,"morph_type"])
hist(mcmc(m_bays$VCV)[,"orientation"])

# The variance can not be equal to zero, but as the mean value is up afainst zero, this can be interpreted as the effect not being of great significance. As the spread of the histograms are reletively narrow, it can be assumed that the distribution is relatively accurate.


# Assesing model convergence through plotting trace and density est. for the intercept
plot(m_bays$Sol)

# The trace can be interpreted as the a psudo timeseries of what the model was doing in each of its iterations. As all traces have the fuzzy caterpillar appearance, it can be assumed there is adequate mixing with overall convergence being met. The density plot shows the postierer distribution of each the model predicted for each variable in each itteration. If the distribution crosses zero, the variable can be assumed to be non-significant and the spread of the distribution convays the overall accuracy of the predicted variable. 


# Variable effect size visualization
ggcoefstats(x = m_bays, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", robust = TRUE, exclude.intercept = T) 
## Warning: Can't extract 'deviance' residuals. Returning response residuals.

# In the dot-whiskar visual representation of the effect sizes and standard errors of our the variables we can clearly see that as before (Hierachrical GLM), the mean elevation and mean annual temperature are highly influencial variables at predicting ELA. This is followed by followed by the also previouly identified varaibles of temperature variability and percipitation in non-monsoon phases. The correspondance between the hinerarchical GLM and baysian model in terms of the magitude of how varaibles influence ELA makes intutive sense, as largely the models largely share the same hirearchical structure and only slighly deviate in how effect sizes and standard errors are calcuated. The lack of major differences between the two models is also accentuated by the fact the only weak (non-informative) priors were selected for the baysian model.
m_bays_d <- MCMCglmm(dELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  T_mean_mea.yr +  T_variability + Slope_mean + Elev_max + Elev_mean, random = ~ morph_type + orientation,  data = glacier_scaled) 
## Warning: Unknown or uninitialised column: `family`.
## Warning: Setting row names on a tibble is deprecated.
## 
##                        MCMC iteration = 0
## 
##                        MCMC iteration = 1000
## 
##                        MCMC iteration = 2000
## 
##                        MCMC iteration = 3000
## 
##                        MCMC iteration = 4000
## 
##                        MCMC iteration = 5000
## 
##                        MCMC iteration = 6000
## 
##                        MCMC iteration = 7000
## 
##                        MCMC iteration = 8000
## 
##                        MCMC iteration = 9000
## 
##                        MCMC iteration = 10000
## 
##                        MCMC iteration = 11000
## 
##                        MCMC iteration = 12000
## 
##                        MCMC iteration = 13000
summary(m_bays_d)
## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: 9191.608 
## 
##  G-structure:  ~morph_type
## 
##            post.mean  l-95% CI u-95% CI eff.samp
## morph_type     2.747 9.264e-17    5.499     1000
## 
##                ~orientation
## 
##             post.mean  l-95% CI u-95% CI eff.samp
## orientation     83.21 1.496e-07    275.8    444.5
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units      2879     2617     3154     1000
## 
##  Location effects: dELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon + T_mean_mea.yr + T_variability + Slope_mean + Elev_max + Elev_mean 
## 
##               post.mean l-95% CI u-95% CI eff.samp  pMCMC    
## (Intercept)    103.1621  95.2098 110.2154    815.6 <0.001 ***
## debris_cov     -29.2438 -46.2934 -12.4820   1000.0  0.002 ** 
## length           0.3298  -9.5330  10.1468   1000.0  0.960    
## area            -1.5623 -11.5572   7.7611   1000.0  0.762    
## P_snow           3.3360  -4.4622  10.0824   1000.0  0.378    
## P_monsoon        8.1003   0.8104  15.0584    570.2  0.030 *  
## P_not_monsoon  -11.5694 -21.4997  -0.9061   1000.0  0.046 *  
## T_mean_mea.yr   34.2254  16.6833  52.6833    556.8 <0.001 ***
## T_variability   -4.4843 -16.6256   7.7295    888.4  0.496    
## Slope_mean       7.4271   2.1189  12.2345    729.2  0.006 ** 
## Elev_max         8.7822   0.1611  17.7915   1000.0  0.050 .  
## Elev_mean       10.9833  -1.4953  23.0228    570.4  0.088 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# R squared
#r.squaredGLMM(m_bays_d)
# Unfortunatly I was not able to figure out access using the r2 with a predefined function, or an official way to cumpute it. Therefore For me bayesian models I do not include r2 values.

# postirior distribution of random effects
par(mfrow = c(1,2))
hist(mcmc(m_bays_d$VCV)[,"morph_type"])
hist(mcmc(m_bays_d$VCV)[,"orientation"])

# The variance can not be equal to zero, but as the mean value is up afainst zero, this can be interpreted as the effect not being of great significance. As the spread of the histograms are reletively narrow, it can be assumed that the distribution is relatively accurate.

# Assesing model convergence through plotting trace and density est. for the intercept
plot(m_bays_d$Sol)

# The trace can be interpreted as the a psudo timeseries of what the model was doing in each of its iterations. As all traces have the fuzzy caterpillar appearance, it can be assumed there is adequate mixing with overall convergence being met. The density plot shows the postierer distribution of each the model predicted for each variable in each itteration. If the distribution crosses zero, the variable can be assumed to be non-significant and the spread of the distribution convays the overall accuracy of the predicted variable. 



# Variable effect size visualization
ggcoefstats(x = m_bays_d, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", robust = TRUE, exclude.intercept = T) 
## Warning: Can't extract 'deviance' residuals. Returning response residuals.

# In the dot-whiskar visual representation of the effect sizes and standard errors of our the variables we can clearly see that as before (Hierachrical GLM), the mean elevation and mean annual temperature are highly influencial variables at predicting ELA. This is followed by followed by the also previouly identified varaibles of temperature variability and percipitation in non-monsoon phases. The correspondance between the hinerarchical GLM and baysian model in terms of the magitude of how varaibles influence ELA makes intutive sense, as largely the models largely share the same hirearchical structure and only slighly deviate in how effect sizes and standard errors are calcuated. The lack of major differences between the two models is also accentuated by the fact the only weak (non-informative) priors were selected for the baysian model.

Q1b PCA

# pca # need to redo analysis/description of pca plots
pca <- glacier_scaled[c(4:9,22:25,38:39,52:53,66:75)] %>% 
  dplyr::select(-orientation) # need to remove as it is a character
str(pca)
## tibble[,23] [849 × 23] (S3: tbl_df/tbl/data.frame)
##  $ summit                : num [1:849] 5710 5750 5715 5760 5675 ...
##  $ debris_cov            : num [1:849] 0 0 0 0 0 0 0 0 0 0 ...
##  $ morph_type            : num [1:849] 2 2 2 3 2 2 2 2 2 2 ...
##  $ length                : num [1:849, 1] -0.386 -0.284 -0.548 -0.579 0.295 ...
##   ..- attr(*, "scaled:center")= num 6216
##   ..- attr(*, "scaled:scale")= num 5539
##  $ area                  : num [1:849, 1] -0.3439 -0.2619 -0.4021 -0.4901 0.0947 ...
##   ..- attr(*, "scaled:center")= num 1393269
##   ..- attr(*, "scaled:scale")= num 2117729
##  $ P_snow                : num [1:849, 1] -0.5453 -0.2486 -0.0552 0.9086 0.1799 ...
##   ..- attr(*, "scaled:center")= num 120
##   ..- attr(*, "scaled:scale")= num 50.7
##  $ P_year                : num [1:849, 1] -1.211 -0.811 -0.735 -0.782 -0.027 ...
##   ..- attr(*, "scaled:center")= num 889
##   ..- attr(*, "scaled:scale")= num 177
##  $ P_monsoon             : num [1:849, 1] -1.121 -0.593 -0.599 -0.564 0.481 ...
##   ..- attr(*, "scaled:center")= num 638
##   ..- attr(*, "scaled:scale")= num 94.4
##  $ P_not_monsoon         : num [1:849, 1] -1.162 -0.936 -0.788 -0.912 -0.538 ...
##   ..- attr(*, "scaled:center")= num 251
##   ..- attr(*, "scaled:scale")= num 93.3
##  $ T_MIN_mean_monsoon    : num [1:849, 1] -1.18 -1.59 -1.41 -1.72 -1.43 ...
##   ..- attr(*, "scaled:center")= num 1.13
##   ..- attr(*, "scaled:scale")= num 1.08
##  $ T_MIN_mean_not_monsoon: num [1:849, 1] -1.24 -1.61 -1.43 -1.7 -1.47 ...
##   ..- attr(*, "scaled:center")= num -11.9
##   ..- attr(*, "scaled:scale")= num 1.3
##  $ T_MAX_mean_monsoon    : num [1:849, 1] -1.08 -1.59 -1.38 -1.75 -1.36 ...
##   ..- attr(*, "scaled:center")= num 7.81
##   ..- attr(*, "scaled:scale")= num 0.892
##  $ T_MAX_mean_not_monsoon: num [1:849, 1] -1.16 -1.62 -1.41 -1.76 -1.46 ...
##   ..- attr(*, "scaled:center")= num -2.75
##   ..- attr(*, "scaled:scale")= num 1.04
##  $ T_mean_mea.yr         : num [1:849, 1] -1.18 -1.61 -1.42 -1.74 -1.45 ...
##   ..- attr(*, "scaled:center")= num -2.4
##   ..- attr(*, "scaled:scale")= num 1.08
##  $ T_mean_monsoon        : num [1:849, 1] -1.13 -1.6 -1.41 -1.74 -1.4 ...
##   ..- attr(*, "scaled:center")= num 4.47
##   ..- attr(*, "scaled:scale")= num 0.983
##  $ T_mean_not_monsoon    : num [1:849, 1] -1.21 -1.61 -1.43 -1.74 -1.47 ...
##   ..- attr(*, "scaled:center")= num -7.31
##   ..- attr(*, "scaled:scale")= num 1.16
##  $ Slope_min             : num [1:849, 1] 0.69002 -0.16035 0.41288 0.00442 -0.79263 ...
##   ..- attr(*, "scaled:center")= num 8.45
##   ..- attr(*, "scaled:scale")= num 6.45
##  $ Slope_max             : num [1:849, 1] -0.138 -0.748 -0.6 -0.855 -1.09 ...
##   ..- attr(*, "scaled:center")= num 46.2
##   ..- attr(*, "scaled:scale")= num 10.4
##  $ Slope_mean            : num [1:849, 1] -0.359 -0.925 -0.342 -0.431 -1.285 ...
##   ..- attr(*, "scaled:center")= num 27.5
##   ..- attr(*, "scaled:scale")= num 6.44
##  $ Elev_min              : num [1:849, 1] 0.837 0.901 1.071 1.336 0.896 ...
##   ..- attr(*, "scaled:center")= num 4800
##   ..- attr(*, "scaled:scale")= num 407
##  $ Elev_max              : num [1:849, 1] 0.641 0.86 0.699 0.991 0.548 ...
##   ..- attr(*, "scaled:center")= num 5505
##   ..- attr(*, "scaled:scale")= num 292
##  $ Elev_mean             : num [1:849, 1] 0.994 1.377 1.464 1.71 1.191 ...
##   ..- attr(*, "scaled:center")= num 5155
##   ..- attr(*, "scaled:scale")= num 221
##  $ T_variability         : num [1:849, 1] -0.929 -1.527 -1.286 -1.73 -1.291 ...
##   ..- attr(*, "scaled:center")= num 10.4
##   ..- attr(*, "scaled:scale")= num 0.782
##  - attr(*, "na.action")= 'exclude' Named int [1:4] 731 739 793 829
##   ..- attr(*, "names")= chr [1:4] "731" "739" "793" "829"
# plot of pc1 and pc2
res.pca <- PCA(pca[,c(1:22)], scale.unit = TRUE, ncp = 5, graph = TRUE) 

## Warning: ggrepel: 6 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

# based on this PCA we can intially see that elev mean is closly aligned with PC1 and has a strong negative orientation, while being diamertically opposed to the agregated temperature and precipitation measures. Geo-physical conditions (aside from elev mean on PC1) tend to be ordinated on PC2 with the strongers factors being glacier area, length, and summit height/max elevation (the same redundent metric...). 


#plot of pc1 and pc3
res.pca <- PCA(pca[,c(1:22)], scale.unit = TRUE, ncp = 5, graph = TRUE, axes = c(1,3))

## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

# based on this plot of PC1 and PC2, we can see that snow precipitation is the main variable contributing to PC3s explained variation. Furthermore the ordination of variables on PC1 are only slightly altered in compairoson to the plot of PC1 and PC2. 


pca <- prcomp(pca, scale. = T)
summary(pca)
## Importance of components:
##                          PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     3.239 2.1197 1.7267 1.26260 1.03151 0.82279 0.68121
## Proportion of Variance 0.456 0.1954 0.1296 0.06931 0.04626 0.02943 0.02018
## Cumulative Proportion  0.456 0.6514 0.7810 0.85032 0.89658 0.92601 0.94619
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.57066 0.52564 0.41415 0.37528 0.32194 0.28148 0.25031
## Proportion of Variance 0.01416 0.01201 0.00746 0.00612 0.00451 0.00344 0.00272
## Cumulative Proportion  0.96035 0.97236 0.97982 0.98594 0.99045 0.99389 0.99662
##                           PC15    PC16    PC17    PC18     PC19     PC20
## Standard deviation     0.20272 0.15590 0.11041 0.01307 0.006805 0.004639
## Proportion of Variance 0.00179 0.00106 0.00053 0.00001 0.000000 0.000000
## Cumulative Proportion  0.99840 0.99946 0.99999 1.00000 1.000000 1.000000
##                            PC21      PC22      PC23
## Standard deviation     9.62e-08 9.493e-16 5.442e-16
## Proportion of Variance 0.00e+00 0.000e+00 0.000e+00
## Cumulative Proportion  1.00e+00 1.000e+00 1.000e+00
pca$rotation[,1:6]
##                                  PC1          PC2          PC3           PC4
## summit                 -0.1156595114 -0.396530022  0.006722996 -0.1575511536
## debris_cov              0.0592119096 -0.269462680 -0.052365271  0.0841236946
## morph_type              0.0400162434 -0.091990076 -0.029584626  0.0998727770
## length                  0.0483474510 -0.419110439  0.066102380  0.2009610901
## area                    0.0498102447 -0.403993717  0.069942759  0.2063360677
## P_snow                  0.0092334579  0.044215327  0.512080560 -0.1067217199
## P_year                  0.1663450945  0.026036585  0.473270085 -0.0983638200
## P_monsoon               0.1149693609 -0.015678340  0.482582772 -0.1269818451
## P_not_monsoon           0.1989915206  0.065216422  0.408839901 -0.0579761041
## T_MIN_mean_monsoon      0.3060541207  0.012778292 -0.053013928  0.0008543518
## T_MIN_mean_not_monsoon  0.3032010943 -0.016791648 -0.044913670 -0.0236876350
## T_MAX_mean_monsoon      0.2981733398  0.062127235 -0.096162241  0.0460564355
## T_MAX_mean_not_monsoon  0.3038968050  0.008468283 -0.074287604  0.0015814723
## T_mean_mea.yr           0.3054852777  0.009768718 -0.064004869  0.0001986579
## T_mean_monsoon          0.3041721614  0.035272945 -0.072790921  0.0211532886
## T_mean_not_monsoon      0.3047792037 -0.005670976 -0.058388518 -0.0124484181
## Slope_min              -0.0008673191  0.172349229 -0.144086417 -0.5497905945
## Slope_max               0.1074204433 -0.340938679 -0.052861818 -0.2748845202
## Slope_mean              0.0969690782 -0.129444291 -0.151594973 -0.6427029063
## Elev_min               -0.2245285603  0.284072988  0.023014192 -0.0242427371
## Elev_max               -0.1272311906 -0.398418302  0.024459017 -0.1469850576
## Elev_mean              -0.2903997157 -0.014890301  0.052015325 -0.1132050921
## T_variability           0.2804581157  0.087127203 -0.134391044  0.0736330210
##                                  PC5          PC6
## summit                  0.0824178265 -0.013314116
## debris_cov             -0.1631885443 -0.921362046
## morph_type             -0.8875920506  0.231071412
## length                 -0.0376147737  0.144502015
## area                   -0.1262642924  0.122834120
## P_snow                 -0.1045303257 -0.040817253
## P_year                 -0.0006254728 -0.028488009
## P_monsoon               0.0538899676 -0.034703229
## P_not_monsoon          -0.0557100478 -0.018888474
## T_MIN_mean_monsoon      0.0092062920 -0.004116058
## T_MIN_mean_not_monsoon  0.0024765380  0.003884554
## T_MAX_mean_monsoon      0.0178121599 -0.010498486
## T_MAX_mean_not_monsoon  0.0277241682 -0.001477082
## T_mean_mea.yr           0.0136610122 -0.001637198
## T_mean_monsoon          0.0131736061 -0.006805675
## T_mean_not_monsoon      0.0138881375  0.001490011
## Slope_min              -0.3032269288 -0.114844999
## Slope_max               0.1929150277  0.181647499
## Slope_mean             -0.0472596556  0.047772626
## Elev_min               -0.0550247898 -0.073168174
## Elev_max                0.0741599741  0.021785456
## Elev_mean              -0.0209235332 -0.053078593
## T_variability           0.0487193561 -0.014295989
# based on this roation, I will make the following assumption for which variables are analogs to each PC
# PC1: All tempurature variables (both monsoon & non-monsoon), tempurature variability, and elevation mean
# PC2: Geophysical conditions such as min/max elevation, debris cover, catchment length and area, and max slope
# PC3: All precipitation variables 
# PC4: Slope, and to a far lesser degree catchment area, length and elevation
# PC5: Morphology type 


var_exp <- data.frame(pc = c(1:23),
                      var_exp = pca$sdev^2 / sum(pca$sdev^2))

# add the variances cumulatively
var_exp$var_exp_cumsum <- cumsum(var_exp$var_exp)
var_exp
##    pc      var_exp var_exp_cumsum
## 1   1 4.560124e-01      0.4560124
## 2   2 1.953597e-01      0.6513721
## 3   3 1.296334e-01      0.7810055
## 4   4 6.931112e-02      0.8503167
## 5   5 4.626144e-02      0.8965781
## 6   6 2.943396e-02      0.9260121
## 7   7 2.017606e-02      0.9461881
## 8   8 1.415870e-02      0.9603468
## 9   9 1.201310e-02      0.9723599
## 10 10 7.457546e-03      0.9798175
## 11 11 6.123134e-03      0.9859406
## 12 12 4.506432e-03      0.9904470
## 13 13 3.444813e-03      0.9938918
## 14 14 2.724236e-03      0.9966161
## 15 15 1.786818e-03      0.9984029
## 16 16 1.056764e-03      0.9994597
## 17 17 5.299700e-04      0.9999896
## 18 18 7.426372e-06      0.9999971
## 19 19 2.013656e-06      0.9999991
## 20 20 9.358421e-07      1.0000000
## 21 21 4.023442e-16      1.0000000
## 22 22 3.917793e-32      1.0000000
## 23 23 1.287748e-32      1.0000000
ggplot(var_exp) +
  geom_bar(aes(x = pc, y = var_exp), stat = "identity") +
  geom_line(aes(x = pc, y = var_exp_cumsum)) +
  theme_classic()

# Following the Elbow method, it seems like 3-5 PCs seems like an approapriate cutoff point. Despite being more difficult to interpret the final results, I decided to include 5 PCs, due to PCs 4 and 5 increasing the overall variance explained from ~80% to 90%. This also helps to ensure that the large number of variables in the model are captured in the PCs, especially as morphology type almost exclusively loads on PC5.

# scores
pca_scores <- data.frame(pca$x)
head(pca_scores)
##         PC1         PC2         PC3        PC4        PC5         PC6
## 1 -4.085950  0.12736786 -1.15488972 -0.4276191  0.8461927 -0.19220332
## 2 -5.283278 -0.02000121  0.10140922  0.2929743  0.9265436 -0.25750181
## 3 -4.742488  0.34258840 -0.03086408 -0.5009767  0.7503837 -0.36280031
## 4 -5.739056  0.13114428  0.67963939 -0.2685366 -0.6152369 -0.08876674
## 5 -4.418274 -0.03063813  1.46720119  1.0095502  0.9809866 -0.22197371
## 6 -3.354310  0.37954311  0.86186526  0.4488272  1.1837522 -0.16445739
##          PC7        PC8         PC9         PC10        PC11         PC12
## 1 -0.5924383  0.3157648 -0.06914327  0.040799740 -0.33150857  0.003904768
## 2 -0.4260974 -0.0194212  0.07569821 -0.284495967 -0.08137097 -0.125315391
## 3 -0.4281015  0.1150437  0.10357952  0.006878345  0.01024527 -0.130232249
## 4  0.3062982  0.4004047  0.59410478 -0.129178425  0.05062106  0.089638527
## 5 -0.4874418 -0.4850412 -0.13692691 -0.028121838  0.24060972 -0.082023890
## 6  0.2822511 -0.2910021 -0.38920635 -0.053532077  0.18451985 -0.127911367
##          PC13         PC14        PC15        PC16        PC17        PC18
## 1 -0.06771404 -0.007749453  0.03161404 -0.08157295 -0.02674952 -0.02366540
## 2 -0.12582001  0.104398650  0.04780135 -0.07585605 -0.01790055 -0.01942436
## 3 -0.16104792  0.015764018  0.06123426 -0.03189586 -0.02538805 -0.01376238
## 4  0.16573447 -0.050168883  0.09239933 -0.16075389 -0.01365135 -0.02749751
## 5  0.19389144  0.139088085 -0.01502534 -0.08903185  0.07761670 -0.01844080
## 6  0.31287542 -0.240958925 -0.19863270  0.03489515  0.11186253 -0.01939790
##           PC19          PC20          PC21          PC22          PC23
## 1  0.008769504  0.0026651323 -2.861286e-08 -5.551115e-17  1.387779e-16
## 2  0.001274638  0.0082995976  7.317158e-08 -2.220446e-16  2.220446e-16
## 3 -0.002945277  0.0004759622 -4.582566e-08  2.220446e-16  3.608225e-16
## 4  0.001010093 -0.0047553419  2.704832e-07 -1.110223e-16  3.053113e-16
## 5  0.003278927 -0.0033848377  1.013831e-07 -6.661338e-16 -1.249001e-16
## 6  0.003915545 -0.0039411928 -2.274929e-07 -1.110223e-16 -8.326673e-17
pca_scores_df <- cbind(glacier_scaled, pca_scores)
#ggpairs(pca_scores_df[,c(4:6,8:9,22:25,38:39,52:53,66:75,76:80)]) # first 5 Pcs

m_pca <- glm(ELA ~ PC1 + PC2 + PC3 + PC4 + PC5, data = pca_scores_df)
summary(m_pca)
## 
## Call:
## glm(formula = ELA ~ PC1 + PC2 + PC3 + PC4 + PC5, data = pca_scores_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -365.98   -41.14    -3.62    41.31   562.01  
## 
## Coefficients:
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 5104.9305     2.8600 1784.941  < 2e-16 ***
## PC1          -65.1737     0.8836  -73.757  < 2e-16 ***
## PC2           17.7810     1.3500   13.171  < 2e-16 ***
## PC3            4.7192     1.6573    2.848  0.00451 ** 
## PC4          -24.0376     2.2665  -10.606  < 2e-16 ***
## PC5           -4.2597     2.7743   -1.535  0.12506    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 6944.475)
## 
##     Null deviance: 45691145  on 848  degrees of freedom
## Residual deviance:  5854193  on 843  degrees of freedom
## AIC: 9927.3
## 
## Number of Fisher Scoring iterations: 2
# Here we see that PCs 1,2, and 4 have the strongest effect size and large predictive power for ELA. Based on the caracterization of which variables are represented by each PC, this means that tempurature, geophysical conditions, and slope (and to a smaller extend catchment length, area, and elevation) are the most important variables. Out of the three PCs, PC1 (tempurature variables), are by far the most important with a effect size of -65.17 (Std. error 0.88). PCs 2 (geophysical conditions) and 4 (slope) have effect sizes of 17.78 (Std. error 1.35) and -24.04 (Std. error 2.27) respectively. 

# R squared
r.squaredGLMM(m_pca)
##            R2m       R2c
## [1,] 0.8712126 0.8712126
# The R² is 87.12 %

plot(m_pca)

# the residual vs fitted plot shows a mostly flat and even distribution of points, with some outlying point at intermidiate values. The QQ-plot has skewed tails at both the lower and upper bounds and is substaintial for larger values


dwplot(m_pca) +
   geom_vline(xintercept = 0, colour = "grey60", linetype = 2)

ggcoefstats(x = m_pca, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", exclude.intercept = T, stats.labels =F) 

# In the dot-whiskar visual representation of the effect sizes and standard errors of our the variables we can clearly see that mean elevation and mean annual temperature are the two most influencial variables at predicting ELA. These are then followed by percipitation in non-monsoon phases, aswell as the temperature vairability varable that I aggrigated from mean temperature data. 

dELA model

m_pca_d <- glm(dELA ~ PC1 + PC2 + PC3 + PC4 + PC5, family = "poisson", data = pca_scores_df)
summary(m_pca)
## 
## Call:
## glm(formula = ELA ~ PC1 + PC2 + PC3 + PC4 + PC5, data = pca_scores_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -365.98   -41.14    -3.62    41.31   562.01  
## 
## Coefficients:
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 5104.9305     2.8600 1784.941  < 2e-16 ***
## PC1          -65.1737     0.8836  -73.757  < 2e-16 ***
## PC2           17.7810     1.3500   13.171  < 2e-16 ***
## PC3            4.7192     1.6573    2.848  0.00451 ** 
## PC4          -24.0376     2.2665  -10.606  < 2e-16 ***
## PC5           -4.2597     2.7743   -1.535  0.12506    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 6944.475)
## 
##     Null deviance: 45691145  on 848  degrees of freedom
## Residual deviance:  5854193  on 843  degrees of freedom
## AIC: 9927.3
## 
## Number of Fisher Scoring iterations: 2
# R squared
r.squaredGLMM(m_pca_d)
## Warning: The null model is correct only if all variables used by the original
## model remain unchanged.
##                 R2m       R2c
## delta     0.7644043 0.7644043
## lognormal 0.7652960 0.7652960
## trigamma  0.7635058 0.7635058
# The R² is 76.44 %

plot(m_pca_d)

# the residual vs fitted plot shows a mostly flat and even distribution of points, with some outlying point at intermidiate values


# Variable effect size visualization
dwplot(m_pca_d) +
   geom_vline(xintercept = 0, colour = "grey60", linetype = 2)

ggcoefstats(x = m_pca_d, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", exclude.intercept = T, stats.labels =F) 

# In the dot-whiskar visual representation of the effect sizes and standard errors of our the variables we can clearly see that mean elevation and mean annual temperature are the two most influencial variables at predicting ELA. These are then followed by percipitation in non-monsoon phases, aswell as the temperature vairability varable that I aggrigated from mean temperature data. 

Automated variable selection

For the automated variable variable I focused on variables that I previously found to be strong predictors of ELA, aswell as the de-agrigated forms of temperature and precipitation metrics. While potential interaction between temperature and precipitation variables are likely to exist, these were not included in the saturated model, as just including a reduced list of variables with no interactions, already brought my computer to the edges of its computational limits.

# create semi-saturated model, with the broader variables based on my novice guess. These variables where selected based on a priori assumptions on what I thought would be  will be more relavent to include, based on the outputs of my previous models and exploratory analysis
m_saturated <-
  glm( ELA ~ debris_cov  + morph_type + orientation + length  + P_snow + P_year + P_not_monsoon +                    T_mean_mea.yr + T_mean_monsoon +   T_mean_not_monsoon + Slope_min + Elev_min + Elev_max + Elev_mean, # + T_MIN_mean_monsoon + T_MIN_mean_not_monsoon + T_MAX_mean_monsoon  T_MAX_mean_not_monsoon ; taken out due to computational constraints. Additionally, to get the model to run, area, max/mean slope, t_variability and monsoon precipition also needed to be removed. I chose these variables, as in previous models and in explotatory analysis they tended have low impacts on ELA.
  data = glacier_scaled,
  family = "gaussian") 

summary(m_saturated)
## 
## Call:
## glm(formula = ELA ~ debris_cov + morph_type + orientation + length + 
##     P_snow + P_year + P_not_monsoon + T_mean_mea.yr + T_mean_monsoon + 
##     T_mean_not_monsoon + Slope_min + Elev_min + Elev_max + Elev_mean, 
##     family = "gaussian", data = glacier_scaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -319.00   -22.28    -0.55    21.49   273.08  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         5.087e+03  8.798e+00 578.204  < 2e-16 ***
## debris_cov          1.875e+01  7.928e+00   2.365 0.018255 *  
## morph_type         -2.507e-01  2.733e+00  -0.092 0.926953    
## orientationN        1.495e+01  5.812e+00   2.572 0.010297 *  
## orientationNE       1.537e+01  6.553e+00   2.345 0.019255 *  
## orientationNW       2.370e+01  7.226e+00   3.280 0.001081 ** 
## orientationS        2.711e+01  6.794e+00   3.991 7.17e-05 ***
## orientationSE       6.481e+00  8.088e+00   0.801 0.423155    
## orientationSW       2.608e+01  7.460e+00   3.496 0.000498 ***
## orientationW        2.775e+01  6.691e+00   4.147 3.72e-05 ***
## length             -8.992e+00  3.391e+00  -2.652 0.008158 ** 
## P_snow              6.372e+00  3.453e+00   1.845 0.065359 .  
## P_year              1.664e+01  6.409e+00   2.597 0.009567 ** 
## P_not_monsoon      -3.239e+01  7.727e+00  -4.192 3.06e-05 ***
## T_mean_mea.yr      -4.231e+05  1.411e+07  -0.030 0.976093    
## T_mean_monsoon      1.597e+05  5.329e+06   0.030 0.976091    
## T_mean_not_monsoon  2.646e+05  8.830e+06   0.030 0.976097    
## Slope_min           5.444e+00  1.985e+00   2.742 0.006237 ** 
## Elev_min            1.803e+02  5.926e+00  30.421  < 2e-16 ***
## Elev_max            8.835e+01  4.591e+00  19.244  < 2e-16 ***
## Elev_mean           6.351e+00  7.194e+00   0.883 0.377567    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2401.556)
## 
##     Null deviance: 45691145  on 848  degrees of freedom
## Residual deviance:  1988489  on 828  degrees of freedom
## AIC: 9040.6
## 
## Number of Fisher Scoring iterations: 2
options(na.action = "na.fail") # Required for dredge to run
# Run all possible model permutations and tank according to AIC values
m_dredge <- dredge(m_saturated, beta = F, evaluate = T, rank = AICc)
## Fixed term is "(Intercept)"
options(na.action = "na.omit") # set back to default
head(m_dredge)
## Global model call: glm(formula = ELA ~ debris_cov + morph_type + orientation + length + 
##     P_snow + P_year + P_not_monsoon + T_mean_mea.yr + T_mean_monsoon + 
##     T_mean_not_monsoon + Slope_min + Elev_min + Elev_max + Elev_mean, 
##     family = "gaussian", data = glacier_scaled)
## ---
## Model selection table 
##       (Int) dbr_cov Elv_max Elv_men Elv_min    lng orn P_not_mns P_snw P_yer
## 10204  5086   19.26   90.23           183.9 -9.266   +    -30.19 5.733 15.09
## 9948   5086   19.37   91.32           184.9 -9.292   +    -24.51       15.51
## 8156   5086   19.04   90.97           183.6 -9.328   +    -32.47 6.361 16.73
## 14300  5086   19.04   90.97           183.6 -9.328   +    -32.47 6.361 16.73
## 12252  5086   19.04   90.97           183.6 -9.328   +    -32.47 6.361 16.73
## 10208  5086   19.03   88.16   4.689   181.4 -9.070   +    -29.66 5.595 14.72
##       Slp_min T_men_mea.yr T_men_mns T_men_not_mns df    logLik   AICc delta
## 10204   5.427                               -28.51 18 -4498.965 9034.8  0.00
## 9948    5.395                               -30.98 17 -4500.478 9035.7  0.94
## 8156    5.498       -60.91     33.41               19 -4498.702 9036.3  1.57
## 14300   5.498                  10.41        -38.10 19 -4498.702 9036.3  1.57
## 12252   5.498        27.58                  -55.36 19 -4498.702 9036.3  1.57
## 10208   5.348                               -26.91 19 -4498.733 9036.4  1.63
##       weight
## 10204  0.291
## 9948   0.182
## 8156   0.133
## 14300  0.133
## 12252  0.133
## 10208  0.129
## Models ranked by AICc(x)
nrow(m_dredge) # 16384 models in total
## [1] 16384
top_model <- get.models(m_dredge, subset = 1)[[1]]
top_model
## 
## Call:  glm(formula = ELA ~ debris_cov + Elev_max + Elev_min + length + 
##     orientation + P_not_monsoon + P_snow + P_year + Slope_min + 
##     T_mean_not_monsoon + 1, family = "gaussian", data = glacier_scaled)
## 
## Coefficients:
##        (Intercept)          debris_cov            Elev_max            Elev_min  
##           5086.266              19.258              90.232             183.873  
##             length        orientationN       orientationNE       orientationNW  
##             -9.266              14.967              15.421              23.393  
##       orientationS       orientationSE       orientationSW        orientationW  
##             27.646               7.058              26.234              27.895  
##      P_not_monsoon              P_snow              P_year           Slope_min  
##            -30.188               5.733              15.093               5.427  
## T_mean_not_monsoon  
##            -28.514  
## 
## Degrees of Freedom: 848 Total (i.e. Null);  832 Residual
## Null Deviance:       45690000 
## Residual Deviance: 1992000   AIC: 9034
# Summarize top model
summary(top_model)
## 
## Call:
## glm(formula = ELA ~ debris_cov + Elev_max + Elev_min + length + 
##     orientation + P_not_monsoon + P_snow + P_year + Slope_min + 
##     T_mean_not_monsoon + 1, family = "gaussian", data = glacier_scaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -319.02   -22.06    -0.35    21.54   274.51  
## 
## Coefficients:
##                    Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)        5086.266      4.483 1134.445  < 2e-16 ***
## debris_cov           19.258      7.854    2.452 0.014408 *  
## Elev_max             90.232      3.345   26.971  < 2e-16 ***
## Elev_min            183.873      4.494   40.917  < 2e-16 ***
## length               -9.266      3.227   -2.871 0.004192 ** 
## orientationN         14.967      5.797    2.582 0.009994 ** 
## orientationNE        15.421      6.537    2.359 0.018550 *  
## orientationNW        23.393      7.206    3.246 0.001216 ** 
## orientationS         27.646      6.726    4.110 4.34e-05 ***
## orientationSE         7.058      8.052    0.877 0.381004    
## orientationSW        26.234      7.412    3.539 0.000424 ***
## orientationW         27.895      6.616    4.216 2.76e-05 ***
## P_not_monsoon       -30.188      6.948   -4.345 1.57e-05 ***
## P_snow                5.733      3.327    1.723 0.085203 .  
## P_year               15.093      5.881    2.566 0.010454 *  
## Slope_min             5.427      1.950    2.783 0.005505 ** 
## T_mean_not_monsoon  -28.514      4.442   -6.419 2.31e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2393.755)
## 
##     Null deviance: 45691145  on 848  degrees of freedom
## Residual deviance:  1991604  on 832  degrees of freedom
## AIC: 9033.9
## 
## Number of Fisher Scoring iterations: 2
# psudo R^2
r.squaredGLMM(top_model) # 95.74%
##            R2m       R2c
## [1,] 0.9556106 0.9556106
plot(top_model)

# the residual vs fitted plot shows a mostly flat and even distribution of points, with greater diviances seen at the lower spectrum of points, likely due to the negative skew described above

qqnorm(resid(top_model))
qqline(resid(top_model))

# When checking the QQ-plot it is seen that there is a left tail at lower bonds and a right tail at the upper bonds. Following an ad hoc transformation of the dependant variable ELA, this observed trend only showed a very marginal improvment, so I decided to keep ELA in its untransformed state.  


# Variable effect size visualization
dwplot(top_model) +
   geom_vline(xintercept = 0, colour = "grey60", linetype = 2)

# The automated model section based on AIC ranking of 16384 models, indicated that min and max elevation have are the two most important variables to predict ELA(with effect sizes 183.87 and 90.23 respectily & with small std.errors of 4.49 and 3.34), with large effect sizes relative to other variables. The other next most important variables where the various orientation directions, aswell as preciption and temperature in non monsoon periods. A noteable ommison is mean yearly temperature, which in the glmer model had the highest effect size. This could be attributed to high colinearity between mean temperature and temperature in non monsoon periods, but with temperature in non monsoon periods have a large effect size and therefor being prioritized in the selection. 


ggcoefstats(x = top_model, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", exclude.intercept = T, stats.labels =F) 

# In the dot-whiskar visual representation of the effect sizes and standard errors of our the variables we can clearly see that mean elevation and mean annual temperature are the two most influencial variables at predicting ELA. These are then followed by percipitation in non-monsoon phases, aswell as the temperature vairability varable that I aggrigated from mean temperature data. 

Q2. Can we explain ELA changes since the Little Ice Age with topographical and climatological variables?

m_saturated_d <-
  glm( dELA ~ debris_cov  + morph_type + orientation + length + P_year + P_not_monsoon + T_mean_mea.yr + T_mean_monsoon +   T_mean_not_monsoon + Slope_min + Slope_mean + Elev_min + Elev_max + Elev_mean, # + T_MIN_mean_monsoon + T_MIN_mean_not_monsoon + T_MAX_mean_monsoon  T_MAX_mean_not_monsoon ; taken out due to computational constraints. Additionally, to get the model to run, area, max slope, t_variability snow, and monsoon precipition also needed to be removed. I chose these variables, as in previous models and in explotatory analysis they tended have low impacts on ELA.
  data = glacier_scaled,
  family = "gaussian") 

summary(m_saturated_d)
## 
## Call:
## glm(formula = dELA ~ debris_cov + morph_type + orientation + 
##     length + P_year + P_not_monsoon + T_mean_mea.yr + T_mean_monsoon + 
##     T_mean_not_monsoon + Slope_min + Slope_mean + Elev_min + 
##     Elev_max + Elev_mean, family = "gaussian", data = glacier_scaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -117.94   -34.45    -6.80    27.19   337.05  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.027e+02  9.506e+00  10.807  < 2e-16 ***
## debris_cov         -2.581e+01  8.558e+00  -3.015 0.002645 ** 
## morph_type         -2.464e+00  2.954e+00  -0.834 0.404599    
## orientationN       -3.077e+00  6.298e+00  -0.489 0.625318    
## orientationNE      -1.553e+00  7.068e+00  -0.220 0.826119    
## orientationNW      -1.135e+00  7.819e+00  -0.145 0.884588    
## orientationS        2.305e+01  7.359e+00   3.133 0.001794 ** 
## orientationSE       4.858e+00  8.729e+00   0.557 0.577990    
## orientationSW       2.228e+01  8.080e+00   2.758 0.005948 ** 
## orientationW        1.725e+01  7.216e+00   2.390 0.017068 *  
## length              5.503e+00  3.808e+00   1.445 0.148752    
## P_year              6.092e+00  6.936e+00   0.878 0.380065    
## P_not_monsoon      -7.430e+00  7.174e+00  -1.036 0.300616    
## T_mean_mea.yr       7.542e+06  1.520e+07   0.496 0.619925    
## T_mean_monsoon     -2.847e+06  5.739e+06  -0.496 0.619923    
## T_mean_not_monsoon -4.718e+06  9.510e+06  -0.496 0.619928    
## Slope_min          -3.628e+00  2.484e+00  -1.461 0.144496    
## Slope_mean          1.233e+01  2.911e+00   4.236 2.53e-05 ***
## Elev_min            2.941e+01  6.422e+00   4.580 5.37e-06 ***
## Elev_max            1.907e+01  5.151e+00   3.702 0.000228 ***
## Elev_mean          -1.581e+01  7.868e+00  -2.010 0.044805 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2796.718)
## 
##     Null deviance: 2869609  on 848  degrees of freedom
## Residual deviance: 2315682  on 828  degrees of freedom
## AIC: 9169.9
## 
## Number of Fisher Scoring iterations: 2
options(na.action = "na.fail") # Required for dredge to run
# Run all possible model permutations and tank according to AIC values
model_dredge_d <- dredge(m_saturated_d, beta = F, evaluate = T, rank = AICc)
## Fixed term is "(Intercept)"
options(na.action = "na.omit") # set back to default
head(model_dredge_d)
## Global model call: glm(formula = dELA ~ debris_cov + morph_type + orientation + 
##     length + P_year + P_not_monsoon + T_mean_mea.yr + T_mean_monsoon + 
##     T_mean_not_monsoon + Slope_min + Slope_mean + Elev_min + 
##     Elev_max + Elev_mean, family = "gaussian", data = glacier_scaled)
## ---
## Model selection table 
##       (Int) dbr_cov Elv_max Elv_men Elv_min   lng orn Slp_men Slp_min
## 7760  96.29  -24.74   22.42  -16.50   27.21         +   11.41  -4.532
## 13904 96.29  -24.74   22.42  -16.50   27.21         +   11.41  -4.532
## 11856 96.29  -24.74   22.42  -16.50   27.21         +   11.41  -4.532
## 7776  96.03  -26.20   19.95  -16.09   29.74 4.846   +   12.45  -4.007
## 13920 96.03  -26.20   19.95  -16.09   29.74 4.846   +   12.45  -4.007
## 11872 96.03  -26.20   19.95  -16.09   29.74 4.846   +   12.45  -4.007
##       T_men_mea.yr T_men_mns T_men_not_mns df    logLik   AICc delta weight
## 7760         75.91    -52.35               17 -4565.181 9165.1  0.00  0.177
## 13904                 -23.69         47.49 17 -4565.181 9165.1  0.00  0.177
## 11856       -62.75                   86.74 17 -4565.181 9165.1  0.00  0.177
## 7776         74.48    -50.60               18 -4564.261 9165.3  0.25  0.156
## 13920                 -22.48         46.59 18 -4564.261 9165.3  0.25  0.156
## 11872       -59.56                   83.85 18 -4564.261 9165.3  0.25  0.156
## Models ranked by AICc(x)
nrow(model_dredge_d) # 16384 models in total
## [1] 16384
top_model_d <- get.models(model_dredge_d, subset = 1)[[1]]
top_model_d
## 
## Call:  glm(formula = dELA ~ debris_cov + Elev_max + Elev_mean + Elev_min + 
##     orientation + Slope_mean + Slope_min + T_mean_mea.yr + T_mean_monsoon + 
##     1, family = "gaussian", data = glacier_scaled)
## 
## Coefficients:
##    (Intercept)      debris_cov        Elev_max       Elev_mean        Elev_min  
##         96.293         -24.740          22.417         -16.498          27.209  
##   orientationN   orientationNE   orientationNW    orientationS   orientationSE  
##         -3.414          -2.425          -1.927          23.096           4.273  
##  orientationSW    orientationW      Slope_mean       Slope_min   T_mean_mea.yr  
##         21.730          16.140          11.413          -4.532          75.911  
## T_mean_monsoon  
##        -52.346  
## 
## Degrees of Freedom: 848 Total (i.e. Null);  833 Residual
## Null Deviance:       2870000 
## Residual Deviance: 2328000   AIC: 9164
# Summarize top model
summary(top_model_d)
## 
## Call:
## glm(formula = dELA ~ debris_cov + Elev_max + Elev_mean + Elev_min + 
##     orientation + Slope_mean + Slope_min + T_mean_mea.yr + T_mean_monsoon + 
##     1, family = "gaussian", data = glacier_scaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -114.17   -35.46    -8.50    27.54   335.72  
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      96.293      4.841  19.891  < 2e-16 ***
## debris_cov      -24.740      8.383  -2.951  0.00325 ** 
## Elev_max         22.417      4.702   4.768 2.20e-06 ***
## Elev_mean       -16.498      7.856  -2.100  0.03603 *  
## Elev_min         27.209      6.091   4.467 9.03e-06 ***
## orientationN     -3.414      6.140  -0.556  0.57838    
## orientationNE    -2.425      6.976  -0.348  0.72822    
## orientationNW    -1.927      7.791  -0.247  0.80473    
## orientationS     23.096      7.209   3.204  0.00141 ** 
## orientationSE     4.273      8.691   0.492  0.62306    
## orientationSW    21.730      7.900   2.751  0.00608 ** 
## orientationW     16.140      7.180   2.248  0.02484 *  
## Slope_mean       11.413      2.788   4.094 4.65e-05 ***
## Slope_min        -4.532      2.419  -1.873  0.06137 .  
## T_mean_mea.yr    75.911     23.609   3.215  0.00135 ** 
## T_mean_monsoon  -52.346     23.250  -2.251  0.02462 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2794.487)
## 
##     Null deviance: 2869609  on 848  degrees of freedom
## Residual deviance: 2327807  on 833  degrees of freedom
## AIC: 9164.4
## 
## Number of Fisher Scoring iterations: 2
# psudo R^2
r.squaredGLMM(top_model_d)  # 83.01%
##            R2m       R2c
## [1,] 0.1860884 0.1860884
plot(top_model_d)

# the residual vs fitted plot shows a mostly flat and even distribution of points, with greater diviances seen at the lower spectrum of points, likely due to the negative skew described above


# When checking the QQ-plot it is seen that there is a left tail at lower bonds and a right tail at the upper bonds. Following an ad hoc transformation of the dependant variable ELA, this observed trend only showed a very marginal improvment, so I decided to keep ELA in its untransformed state.  


# Variable effect size visualization
dwplot(top_model_d) +
   geom_vline(xintercept = 0, colour = "grey60", linetype = 2)

# In the dot-whiskar visual representation of the effect sizes and standard errors of our the variables we can  see that mean elevation and mean annual temperature are the two most influencial variables at predicting dELA (although the are also accompandied by large standard erros). These are then followed by percipitation the varioua available elevation variables, aswell as the debris cover, which was already found to be relevent in predicting dELA in the hirearchical GLM and basian models. 


ggcoefstats(x = top_model_d, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", exclude.intercept = T, stats.labels =F) 

# In the dot-whiskar visual representation of the effect sizes and standard errors of our the variables we can clearly see that mean elevation and mean annual temperature are the two most influencial variables at predicting ELA. These are then followed by percipitation in non-monsoon phases, aswell as the temperature vairability varable that I aggrigated from mean temperature data. 

Compairisons and conclusions

AICc_df <- AICc(m_lme4, m_lme4_d, m_bays, m_bays_d, m_pca, m_pca_d, top_model, top_model_d) %>% 
  arrange(AICc)

AICc_df
##             df      AICc
## top_model   18  9034.754
## top_model_d 17  9165.099
## m_bays_d    15  9203.665
## m_lme4      15  9586.421
## m_bays      15  9651.466
## m_pca        7  9927.470
## m_lme4_d    14 26667.863
## m_pca_d      6 28475.896
# Based on a compaired ranking between the modeles contructed under my three approaches, it is seen that automated variable selection performed best for the current ELA, while the lme4 had the lowrst AICc for dELA.